Major refactor to move multi-threaded load generator to async event loops by nv-alicheng · Pull Request #282 · mlcommons/endpoints

nv-alicheng · 2026-04-14T02:07:08Z

What does this PR do?

Major refactor which moves the multi-threaded system to an event loop based one to reduced Python context switching for higher throughput and performance.

Type of change

Bug fix
New feature
Documentation update
Refactor/cleanup

Related issues

This merges from a side branch which consists of multiple PRs:

Testing

Tests added/updated
All tests pass locally
Manual testing completed

Checklist

Code follows project style
Pre-commit hooks pass
Documentation updated (if needed)

github-actions · 2026-04-14T02:07:18Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

gemini-code-assist

Code Review

This pull request transitions the load generator to an asynchronous architecture and replaces the SQLite-based metrics pipeline with a high-performance, mmap-backed key-value store. It introduces a new BenchmarkSession for phase orchestration and optimized async strategies for Poisson, burst, and concurrency load patterns. The review feedback identifies a potential race condition caused by premature temporary directory deletion in the scoring artifact logic, suggests using generic placeholders for endpoint URLs to improve configuration portability, and recommends dynamically registering metric keys using enums to enhance code maintainability and prevent silent failures.

arekay-nv

Made a first pass. Will finish up tomorrow.

arekay-nv

Part 2 of reviews - next one should wrap it up

arekay-nv

Awesome work - thanks.
I will run some perf test offline, but looks good!

- docs(report_design): add "reports reproducible from event log" principle - refactor(metrics_table): rename subtract_field -> delta_start_fieldname - docs(metrics_table): reword ISL token_ids/text comments so SGLang/OpenAI are examples, not defining conditions - test(kv_store): pin empty SeriesStats min/max sentinels; add snapshot isolation regression test - test(aggregator): add explanatory messages to tracking-window asserts - test(report_builder): pin max/std_dev on empty compute_summary - test(sample_order): parametrize over [3, 100, 10_000] dataset sizes - test(zmq_pool_transport): collapse two pool transport classes into one parametrized over (num_workers, create_publisher) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Addresses PR #282 thread T33: reframe the docstring so MetricsAggregator and Harmony read as examples rather than a closed list of consumers, and state explicitly that this function never loads or downloads the tokenizer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: address PR review comments — fix tmpfs cleanup, use metric enums, fix templates - Fix premature tmpfs deletion in _write_scoring_artifacts (cleanup now owned solely by run_benchmark's finally block) - Replace hardcoded metric key strings in _setup_kv_reader with MetricCounterKey/MetricSeriesKey enum iteration - Fix config templates: replace placeholder URLs with http://localhost:8000, remove nonexistent record_worker_events field, fix YAML indentation - Replace internal hostname in gptoss_test.yaml with localhost Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Disable warmup by default until rules are defined * Remove stray config file * Add TPOT_NS to _STREAMING_ONLY metrics set Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * chore: address second round of PR #282 review comments - docs(report_design): add "reports reproducible from event log" principle - refactor(metrics_table): rename subtract_field -> delta_start_fieldname - docs(metrics_table): reword ISL token_ids/text comments so SGLang/OpenAI are examples, not defining conditions - test(kv_store): pin empty SeriesStats min/max sentinels; add snapshot isolation regression test - test(aggregator): add explanatory messages to tracking-window asserts - test(report_builder): pin max/std_dev on empty compute_summary - test(sample_order): parametrize over [3, 100, 10_000] dataset sizes - test(zmq_pool_transport): collapse two pool transport classes into one parametrized over (num_workers, create_publisher) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: clarify _check_tokenizer_exists is a probe, consumers are examples Addresses PR #282 thread T33: reframe the docstring so MetricsAggregator and Harmony read as examples rather than a closed list of consumers, and state explicitly that this function never loads or downloads the tokenizer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Squashed combination of 8 commits developed on feat/alicheng-pubsub-integration: - Initial cleanup to remove all features that will be replaced and permanently removed (#214) - Add KVStore, ready-check mechanism, and ServiceLauncher (#215) - Add report building (#216) - Fix rebase errors - Make Loadgen Async (#255) - Do not use /dev/shm tmpfs for KVStore on ARM, fix rebase errors in e2e_oracle tests - Add batched publishing to pubsub (#281) - Disable warmup temporarily, bugfixes. (#288) Squashed locally prior to rebase onto origin/main to avoid a case-insensitive-filesystem conflict on docs/load_generator/DESIGN.md between intermediate commits. Full per-sub-PR history is preserved in the review threads on PR #282 and on the backup ref backup/pre-squash-pubsub-integration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

viraatc

looks great! thanks!

Squashed combination of 8 commits developed on feat/alicheng-pubsub-integration: - Initial cleanup to remove all features that will be replaced and permanently removed (#214) - Add KVStore, ready-check mechanism, and ServiceLauncher (#215) - Add report building (#216) - Fix rebase errors - Make Loadgen Async (#255) - Do not use /dev/shm tmpfs for KVStore on ARM, fix rebase errors in e2e_oracle tests - Add batched publishing to pubsub (#281) - Disable warmup temporarily, bugfixes. (#288) Squashed locally prior to rebase onto origin/main to avoid a case-insensitive-filesystem conflict on docs/load_generator/DESIGN.md between intermediate commits. Full per-sub-PR history is preserved in the review threads on PR #282 and on the backup ref backup/pre-squash-pubsub-integration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

nv-alicheng requested a review from a team April 14, 2026 02:07

github-actions Bot requested review from arekay-nv and nvzhihanj April 14, 2026 02:07

github-code-quality Bot found potential problems Apr 14, 2026

View reviewed changes

gemini-code-assist Bot reviewed Apr 14, 2026

View reviewed changes

Comment thread src/inference_endpoint/commands/benchmark/execute.py Outdated

Comment thread configs/gptoss_test.yaml Outdated

Comment thread src/inference_endpoint/commands/benchmark/execute.py

github-code-quality Bot found potential problems Apr 16, 2026

View reviewed changes

Comment thread src/inference_endpoint/load_generator/session.py Fixed

arekay-nv reviewed Apr 18, 2026

View reviewed changes

Comment thread configs/gptoss_test.yaml Outdated

Comment thread tests/unit/async_utils/services/metrics_aggregator/test_kv_store.py

Comment thread tests/unit/async_utils/services/metrics_aggregator/test_kv_store.py

arekay-nv reviewed Apr 19, 2026

View reviewed changes

arekay-nv approved these changes Apr 19, 2026

View reviewed changes

Comment thread src/inference_endpoint/commands/benchmark/execute.py Outdated

Comment thread src/inference_endpoint/commands/benchmark/execute.py Outdated

Comment thread src/inference_endpoint/commands/benchmark/execute.py Outdated

github-code-quality Bot found potential problems Apr 21, 2026

View reviewed changes

nv-alicheng force-pushed the feat/alicheng-pubsub-integration branch from 4c785b2 to aaf6f9b Compare April 21, 2026 20:15

viraatc reviewed Apr 22, 2026

View reviewed changes

Comment thread src/inference_endpoint/commands/benchmark/execute.py

viraatc approved these changes Apr 22, 2026

View reviewed changes

nv-alicheng and others added 2 commits April 22, 2026 12:50

Regenerate template files

d44be10

nv-alicheng force-pushed the feat/alicheng-pubsub-integration branch from aaf6f9b to d44be10 Compare April 22, 2026 20:02

nv-alicheng and others added 2 commits April 22, 2026 14:44

Use HF_TOKEN to determine whether or not to run test_template_runs

46da470

update regenerate_template script, pass ci

a056e0a

nv-alicheng merged commit 2a60f63 into main Apr 22, 2026
7 checks passed

github-actions Bot locked and limited conversation to collaborators Apr 22, 2026

Conversation

nv-alicheng commented Apr 14, 2026

What does this PR do?

Type of change

Related issues

Testing

Checklist

Uh oh!

github-actions Bot commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

arekay-nv left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

arekay-nv left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

arekay-nv left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

viraatc left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions Bot commented Apr 14, 2026 •

edited

Loading